Abstract

This paper considers the responses of 115 school students to two problems
based on information provided in two-way tables. In each case the question
asks if one of the variables involved depends on the other. Contextual
knowledge might suggest a dependent relationship in both, but in one
problem the data show independence while in the other the data imply an
inverse relationship. A wide range of solution strategies illustrates the
cognitive complexity of dealing with information in two-way tables.


Introduction 

Two-way tables have an identity crisis when it comes to their placement
in curriculum documents. Are they part of statistics or part of probability?
In the Australian Curriculum: Mathematics (Australian Curriculum,
Assessment and Reporting Authority [ACARA], 2013) they fall under the
Chance sub-strand for Year 8, where students “represent ... events in
two-way tables ... and solve related problems” (p. 54). The elaborations for
this outcome throw no further light on the types of problems to be solved,
except that they involve probability. The Guidelines for Assessment and
Instruction in Statistics Education (GAISE) Report, endorsed by the American
Statistical Association (Franklin, Kader, Mewborn, Moreno, Peck, Perry, &
Scheaffer, 2007), uses an example involving a two-way table to illustrate
the recognition of an “association between two categorical variables”
(p. 62). This description is a statistical embodiment. The interpretation of
two-way tables depends very much on the contexts within which they are
placed and the ways in which questions are posed. Gigerenzer (2002) provides
a comprehensive contextual background on the importance of being able to
interpret conditional information to guide decision-making with regard to
risk, particularly with regard to medical testing for disease. He illustrates
the link between probability and statistics (e.g., p. 108) by stating a testing
outcome first in terms of conditional probabilities in percentages and
second in terms of statistical natural frequencies.

In this study a two-way table is defined as a bivariate frequency table
with two categorical variables each taking two values. One variable is
arranged vertically and the other horizontally. Internally the table consists
of four cells containing the frequency count of data satisfying each of the
combinations of two values for the two variables. The row and column totals on
the two boundaries of the table are sums of the counts for the two values of
the individual variables. A grand total of the data count is in the lower right
cell. If the values of the two variables are A and not A for one variable, and
B and not B for the other variable, with internal frequencies of a, b, c, d, a
two-way table has the form shown in Figure 1.


                   Variable 1
             A          not A      Total
B            a          b          a + b
not B        c          d          c + d
Total        a + c      b + d      a + b + c + d


Figure 1. Format for the two-way tables in this study. 
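
The layout in Figure 1 can be mirrored in a small data structure. The following Python sketch is illustrative only: the class name `TwoWayTable` and its method names are assumptions introduced here, not part of the study. The example counts are those of the Lung Disease problem used in this study, with the first row as the smokers and the first column as those with lung disease.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class TwoWayTable:
    """2x2 frequency table laid out as in Figure 1 (hypothetical helper)."""
    a: int  # row B, column A
    b: int  # row B, column not A
    c: int  # row not B, column A
    d: int  # row not B, column not A

    def row_totals(self):
        return (self.a + self.b, self.c + self.d)

    def column_totals(self):
        return (self.a + self.c, self.b + self.d)

    def grand_total(self):
        return self.a + self.b + self.c + self.d

    def row_proportions_of_A(self):
        """Proportion of A within row B and within row not B."""
        top, bottom = self.row_totals()
        return (Fraction(self.a, top), Fraction(self.c, bottom))


# Lung Disease counts: smokers in the first row, non-smokers in the second.
lung = TwoWayTable(a=90, b=60, c=60, d=40)
print(lung.row_totals(), lung.column_totals(), lung.grand_total())
print(lung.row_proportions_of_A())
```

Exact fractions are used so that equal proportions compare as equal without rounding concerns.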


Although a few studies had been carried out based on interpreting
two-way tables by psychologists and economists as early as the 1960s (e.g.,
Smedslund, 1963) and Gigerenzer drew attention to issues in the 1990s
(e.g., Gigerenzer, 1996), it was the work of Batanero, Estepa, Godino and
Green (1996) that brought the topic to the attention of mathematics
educators. They presented two-way tables with frequency data displaying three
types of relationship: a “positive” direct relationship, a “negative” inverse
relationship, or independence. Analysis of problems presented in this way
may avoid engaging with Bayes’ Theorem but requires a deep understanding
and application of proportional reasoning and hence can be a challenge
to middle school students. Although the phrase proportional reasoning is
not used in Year 8 of The Australian Curriculum: Mathematics (ACARA,
2013), students are expected to solve problems with percentages, rates, and
ratios (p. 52).

Two of the problems from Batanero et al. (1996) are the basis of this
study and are presented in Figure 2. The Lung Disease problem asks if,
based on the data presented, Lung Disease is dependent on smoking.
Although the expectation from students’ experience is likely to be “yes,” the
data in the table show independence. The Indigestion problem asks if, based
on the data, elderly people’s indigestion depends on taking a certain drug.
Although this context is not likely to be as familiar to students as lung
disease and smoking, students are likely to have encountered media stories
about drugs causing the side effect of indigestion. The data in this table,
however, point to the inverse conclusion, that in fact the drug may result in
no indigestion.


Lung Disease:

The following information is from a survey about smoking and lung disease among 250 people.

                 Lung disease    No lung disease    Total
Smoking          90              60                 150
No smoking       60              40                 100
Total            150             100                250

Using this information, do you think that for this sample of people lung disease depended on
smoking? Explain your answer.

Indigestion:

The following information is from a study to assess if a certain drug produces indigestion (stomach
trouble) in elderly people.

                 Indigestion     No Indigestion     Total
Drug taken       8               8                  16
No drug taken    7               1                  8
Total            15              9                  24

Using this information, do you think that the elderly people’s indigestion depends on taking the
drug? Explain your answer.


Figure 2. Two problems from Batanero et al. (1996). 
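
The proportional comparison that each problem calls for can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used in the study: the function name `classify_relationship` and the labels it returns are assumptions, and treating exactly equal row proportions as "independent" is a descriptive statement about the sample rather than an inferential test.

```python
from fractions import Fraction


def classify_relationship(a, b, c, d):
    """Compare the proportion with the outcome in each row of a 2x2 table.

    Rows are (a, b) and (c, d); the first column holds the outcome of
    interest (lung disease, or indigestion).
    """
    first_row = Fraction(a, a + b)
    second_row = Fraction(c, c + d)
    if first_row == second_row:
        label = "independent"
    elif first_row > second_row:
        label = "direct"
    else:
        label = "inverse"
    return first_row, second_row, label


# Counts from Figure 2.
lung = classify_relationship(90, 60, 60, 40)
indigestion = classify_relationship(8, 8, 7, 1)
print(lung)
print(indigestion)
```

For Lung Disease both rows give 3/5, and for Indigestion the drug row gives 1/2 against 7/8 for the no-drug row, which matches the independence and inverse relationship described above.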


Although in the early 1990s, A National Statement on Mathematics for
Australian Schools (Australian Education Council [AEC], 1991) mentioned
associations for bivariate data being “weak, strong, positive or negative,”
there has not been a great deal of research with school students on
problems based on two-way tables. From a teaching perspective of considering
conditional probability, Gigerenzer and Hoffrage (1995), Pfannkuch, Seber,
and Wild (2002) and Watson (1995) suggested two-way tables be used as
an easier method of solving Bayes’ Theorem problems than the traditional
formula approach. More recently, Watson (2011) employed them in
untangling some complex claims found in the media. From a research
perspective, Watson and Callingham (2005) used the Lung Disease problem as a
survey item in a study confirming their statistical literacy hierarchy (Watson
& Callingham, 2003), as did Watson, Callingham, and Donne (2008) in
documenting students’ responses and teachers’ attempts to remediate
incorrect answers. The problem was also used in in-depth interviews with
teachers related to various aspects of pedagogical content knowledge (Watson &
Nathan, 2010). To the authors’ knowledge, the Indigestion problem has not
been used in published research since the work of Batanero et al. (1996).
Although Estrada, Roca, and Batanero (2006) considered two-way tables,
they were investigating conditional probability, not the dependence or
independence of variables.

Several issues arise in relation to these two problems for middle school
students. Although correct final answers are often rewarded with full marks
in assessment situations, for these two problems it is possible to give the
correct answer without complete analysis of all of the data provided in order
to eliminate alternate solutions. In a two-way table, for example, a student
might focus on one large number to claim a positive relationship, without
checking to see if there are other numbers in the table that would challenge
this conclusion. Hence it is important to provide a structural framework for
responses that rewards justifications that employ the necessary elements of
the problem for a valid conclusion, providing a set of levels of response of
increasing quality. For problems of the type used here, it is also important
to document the interference that occurs from the context set and the beliefs
held about it by the person answering. Further, because there is a
difference in the type of association in the two problems (no relationship and
an inverse relationship) it is of interest to explore the association in levels
of response between the two types of relationship.
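
The risk of relying on one or two large numbers can be made concrete with the Lung Disease counts. The short Python sketch below is illustrative only (the variable names are assumptions): it contrasts an additive comparison of two cells with a proportional comparison that uses all four cells.

```python
# Lung Disease counts from the problem statement.
smokers_with, smokers_without = 90, 60
nonsmokers_with, nonsmokers_without = 60, 40

# Additive comparison of two cells: suggests smokers are worse off.
additive_gap = smokers_with - nonsmokers_with

# Proportional comparison using all four cells: the rates are the same.
rate_smokers = smokers_with / (smokers_with + smokers_without)
rate_nonsmokers = nonsmokers_with / (nonsmokers_with + nonsmokers_without)

print(additive_gap)
print(rate_smokers, rate_nonsmokers)
```

The additive gap of 30 people appears to support dependence, yet the disease rate is identical in the two groups once the unequal group sizes are taken into account.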

Because of the presence of the underlying concepts of conditional and
proportional reasoning associated with the two problems in the middle
school curriculum, the question arose as to whether the performance of
students improved consistently across the years of schooling.

These issues led to the following research questions. 

1. To what extent can a hierarchical structure explain understanding 
displayed in middle-school students’ solutions to the problems? 

2. What degree of interference occurs from students’ previous beliefs 
about the contexts of the problems? 

3. What is the degree of association between levels of response to the 
two problems although they display different types of relationship? 

4. Does performance at the cohort level change over the middle school 
years on these problems? 


Method 


Sample 

The students in this project were students of the teachers participating in
a 3-year professional development project in statistics education for school
mathematics teachers (StatSmart) (Callingham & Watson, 2008, 2011). The
students had been part of the project with their teachers for at least two
years and had previously completed three designed longitudinal surveys. At
the end of the project the final survey for this particular group of students
included the Indigestion problem, not included in previous surveys, as well
as the Lung Disease problem, which had been a linking item on all previous
surveys. There was no requirement of the project that the teachers teach
the content associated specifically with these problems, and there was no
evidence from other teacher sources that teaching of the topic had occurred.
The numbers of students in each year level are given in Table 1. For analysis
purposes, data were combined in pairs of years for Years 6/7 and Years 8/9.
Year 7 was in the elementary school for students in one Australian state and
in secondary school for the other two, and there were low numbers in Year 6.
The students were from three Australian states (Tasmania, 69; Victoria, 23;
South Australia, 23). Fifty-one percent were male and 49% female.


Table 1. Number of students in each year level

Year    6     7     8     9     10
N       8     20    21    23    42


Analysis 

Repeated reading of the responses to the Indigestion problem led to a
hierarchical coding by the first author, which was then applied to the Lung
Disease problem responses. The second author independently confirmed
the coding and any discrepancies were discussed and agreement reached.
This approach led to a slightly expanded coding scheme compared with that
used in previous studies (Watson & Callingham, 2005). The coding scheme,
shown in Table 3, has six levels of response. These followed the hierarchy
identified by Watson and Callingham (2003). This Statistical Literacy
hierarchy is shown in Table 2. It is characterised by both engagement with
context and the application of statistical skills, both of increasing complexity.

Table 2. Statistical Literacy Hierarchy (from Watson and Callingham, 2003)

Level                        Characterization

Level 6                      Critical questioning engagement with context; use of
Critical mathematical        proportional reasoning; appreciation of uncertainty in making
                             predictions; interpretation of subtle language.

Level 5                      Critical questioning engagement with context without use of
Critical                     proportional reasoning; appropriate use of terminology;
                             qualitative use of chance and appreciation of variation.

Level 4                      Appropriate but non-critical engagement with context;
Consistent non-critical      multiple aspects of terminology use; limited appreciation of
                             variation only in chance settings; some statistical skills such
                             as mean, simple probabilities; and use of graphs.

Level 3                      Selective engagement with context; appropriate recognition of
Inconsistent                 conclusions with little justification; qualitative use of
                             statistical ideas.

Level 2                      Colloquial or informal engagement with context often
Informal                     reflecting intuitive beliefs; single elements of complex
                             terminology; basic one-step calculations.

Level 1                      Idiosyncratic engagement with context; tautological use of
Idiosyncratic                terminology; simple mathematical skills such as reading a
                             single value in a table.

Table 3 shows the final codings used for both problems. At Code 1
idiosyncratic responses were likely to have made sense to the student but were
incorrect. At Code 2 knowledge about the context in which the problem was
set from the student’s perspective was presented in a sequence to support a
belief in a conclusion. Code 3 responses related together knowledge about
study design and questioned issues such as sample size, data collection,
and lurking variables, but provided no final justified answer. Such an
open-ended argument demonstrated bringing relevant knowledge of the context
together with some qualitative application of statistical skills. At Code 4
only a single number or one variable was used to suggest a response but
there was clear understanding of the context. Code 5 responses used two
numbers from different cells to draw a conclusion but without using enough
information from elsewhere in the table to eliminate other possibilities.
There was no use of any form of proportional representation. At Code 6,
there was clear evidence of proportional reasoning, using data from four of
the cells related together with the variables to justify a correct solution. The
percentage of Code 2 responses reflected the degree to which the contexts
influenced the answers to the questions.

All students supplied assessable answers to both questions and scores
were assigned to each according to the coding in Table 3. Indicative
correlation coefficients were used to explore the relationship between the
responses to the two questions. Using the means and standard deviations for
each year group, 95% confidence intervals were calculated for the
differences in means for each question. These were used to gauge development
and sustainability across the middle school years.


Results 

Research Questions 1 (Hierarchical structure) and 2 (Interference 
of context) 

The percentages of response levels for each of the three year groups are 
given in Table 4. These are then exemplified with quotes from students 
across the years, with note taken of Code 2, reflecting students’ prior beliefs 
about the contexts of the problems. 







Table 3. Coding for student responses to two-way table problems.

Schematic table referred to in the code descriptions:

                    Category A       Not-Category A
Category B          Internal Cell    Internal Cell     Row Total
Not-Category B      Internal Cell    Internal Cell     Row Total
                    Column Total     Column Total      Grand total

Code  Description (with reference to the schematic table above)

6     Provides evidence using all 4 internal cells or a pair of inside cells
      with the corresponding row or column totals.
      Lung Disease: to conclude no association of smoking and lung disease.
      Indigestion: to conclude the inverse effect, less indigestion with the
      drug.

5     Provides evidence from two internal cells or one internal cell and a
      corresponding row or column total.
      Lung Disease: usually to conclude dependence of disease on smoking
      without considering if other cells would show the same relationship
      (some additive comparison of the two cells).
      Indigestion: to conclude any of three possibilities (positive, negative,
      or no relationship).

4     Provides evidence explicitly from one internal cell or from one of the
      variables represented in either row or column totals (that is,
      recognizing only one variable).
      Lung Disease: May use “most” or “more” but without reference to other
      possibilities.
      Indigestion: Often focuses on the single person in one cell.

3     Critical analysis of potential survey methods, the limitations of the
      methods used to collect the data, and/or the sample size.

2     Conclusion based on knowledge of the context or opinion, but not on the
      data provided.

1     Idiosyncratic attempt: numbers in contradiction to own claim or
      impossible to decipher.

0     No response or no justification for a “yes” or “no.”


Table 4. Percentage of levels of response across year levels

                                      Code
Year                0     1     2     3     4     5     6     N     Mean   Std Dev

Lung Disease
6/7                 14%   4%    43%   7%    11%   18%   4%    28    2.64   1.73
8/9                 5%    7%    18%   16%   7%    20%   27%   44    3.84   1.89
10                  2%    5%    24%   5%    7%    26%   33%   42    4.21   1.83
Total (All Years)   6%    5%    26%   10%   8%    22%   24%   114   3.66   1.92

Indigestion
6/7                 32%   0%    14%   0%    18%   14%   21%   28    3.00   2.43
8/9                 7%    5%    0%    16%   5%    39%   30%   44    4.41   1.77
10                  10%   7%    2%    10%   14%   26%   13%   42    4.29   1.89
Total (All Years)   14%   4%    4%    10%   11%   28%   29%   114   4.02   2.06

Responses coded 0 were either blank or gave a response with no
justification.

• [Lung] Don’t smoke. [Year 7] 

• [Indig] I’d say yes. [Year 10] 

Code 1 responses made idiosyncratic claims or used numbers that could not 
be interpreted. 

• [Lung] It doesn’t depend on smoking. But from smoking and non
smoking. The difference is greater with 90 getting lung disease and 60
not that’s 30 people difference and non smoking is 20 people
difference. [Year 10]

• [Lung] 10 more people with L/D were smoking. Maybe? [Year 9] 

• [Indig] No because the numbers for indigestion and no indigestion are 
the same and no drug taken with indigestion is the same as drug taken 
with digestion. [Year 9] 

• [Indig] It depends if the drug is taken because there were fairly even 
numbers for having indigestion. There were 8 people who took the 
drug and had indigestion and 7 who didn’t take the drug and still had 
indigestion. So the drug is really the cause. It’s just chance. [Year 10] 

Responses that were based on beliefs about the context, not considering the
data presented, were coded 2. With reference to Research Question 2, for
the Lung Disease problem this was marginally the most popular response,
whereas there was virtually no influence on the Indigestion problem.

• [Lung] No you could get from car smoke so anyone could get it. [Year 6] 

• [Lung] Smoking does cause lung disease I think it [is] now a fact. [Year 7]

• [Lung] Yes and No. Because people can still get lung disease without
being a smoker. [Year 8]

• [Indig] No not really, anything can cause stomach trouble, you don’t
have to take something for it to happen. But in some cases it does and
has affected stomach trouble, I guess it’s just how well your stomach
can take to things. [Year 10]

• [Indig] No. About half the people that took the drug got indigestion so
scientists could look at why that is. But out of 7 people that did not take
the drug only 1 person didn’t get indigestion. [Year 7]

Some responses criticised the design of the study (Code 3). 

• [Lung] No. The reason why I think this is because the sample size is
much too small to make a decision. [Year 6]

• [Lung] You cannot make an accurate assumption as the same amount
of people weren’t tested. [Year 9]

• [Indig] No, they could have got more people to even it out and then it
would be the same. [Year 8]

• [Indig] You can’t make an accurate assumption. It’s not a clear
representation as the amount surveyed in [who took] the drug taken [sic]
is different from that of those who didn’t take the drug. Also, this is
only a small amount of people from the total population and you don’t
know any other factors such as whether they all ate the same food etc.
[Year 9]


The responses that used data from one cell or only one variable (e.g., row or
column totals) to reach a conclusion were coded 4.

• [Lung] Yes it depends on smoking because 90 people with lung disease
is smoking and that is most of the people out of this graph. [Year 6]

• [Lung] No smoking has less total. [Year 10]

• [Indig] Yes because with no drug taken there is only 1 person with no
indigestion. [Year 7]

• [Indig] More people take the drug and less people take nothing. The
elderly are more dependent on drugs. [Year 9]



Code 5 responses employed information from two cells to reach a
conclusion of “Yes,” “No,” or “no relationship.” Although the conclusion may
have been technically correct, the response did not consider other cells that
might have contradicted the conclusion.

• [Lung] Yes because 90 smokers had lung disease but 60 non-smokers
had it. [Year 7]

• [Lung] The majority of people that smoke had lung cancer, so yes!
[Year 10]

• [Indig] Yes I do. When the drugs are taken they have more stomach
problems than when they don’t take the drugs. [Year 8]

• [Indig] With the people which haven’t taken the drug most of them
have indigestion so I think it’s wise to take the drug. [Year 7]



At the highest level (Code 6), responses used all four internal cells or two
internal cells and the associated row/column totals to reach the
appropriate conclusion. For the Indigestion problem, some students said “No,” not
meaning “no relationship” but meaning “no the opposite to what would be
expected.”

• [Lung] No, they have the same ratio. [Year 8] 

• [Lung] 90/150= 180/300. 60/100= 180/300. For this sample of people, 
I don’t think lung disease depended on smoking. [Year 9] 

• [Lung] No, because the same percentage of people had lung disease in 
both tests. [Year 9] 

• [Lung] No. In both, 3/5 people will have lung disease [Year 10] 

• [Indig] If you don’t take drugs it is rare to get no indigestion but if you 
do it is a 50-50 chance. [Year 7] 

• [Indig] I think that it actually would help prevent indigestion as 1/2 of
the people who had taken the drug got indigestion and 7/8 of the people
who hadn’t taken the drug got indigestion. 1/2 = 4/8 7/8 > 4/8 [Year 9]

• [Indig] When the drug is taken it’s a 50-50 chance of getting
indigestion but when there is no drug is a 1 in 8 chance of not getting it.
[Year 9]

• [Indig] No. People who take the drug have a lower ratio to people who
don’t. [Year 10]


Research Question 3 (Association of responses) 

For each year level grouping, the correlation of response levels between
the two questions is given in Table 5. Only for Year 10 was the
relationship statistically significant. The consistency of performance across the
two problems increases over the year groupings. In Year 10, although the
average performances were not significantly different from those in Years 8/9,
performance was much more consistent across the two problems.


Table 5. Correlation between problems for year groups 


Year           6/7 (n = 28)    8/9 (n = 44)    10 (n = 42)
Correlation    -.009 (ns)      .145 (ns)       .553 (p < 0.001)
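
The significance pattern in Table 5 can be checked approximately from the coefficients and sample sizes alone. The Python sketch below is a rough check under stated assumptions: it treats the reported values as Pearson-type correlations and uses the standard t statistic for testing a zero correlation, neither of which is confirmed by the description of "indicative correlation coefficients" in the Method. The function name `correlation_t_statistic` is introduced here for illustration.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing zero correlation."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# (year group, correlation, n) as reported in Table 5.
table5 = [("6/7", -0.009, 28), ("8/9", 0.145, 44), ("10", 0.553, 42)]
t_values = {label: correlation_t_statistic(r, n) for label, r, n in table5}

for label, t in t_values.items():
    print(label, round(t, 2))
```

For samples of this size the two-sided 5% critical value of t is roughly 2.0 to 2.1, so only the Year 10 statistic clears it, which is consistent with the table.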


Research Question 4 (Cohort change) 

The 95% confidence intervals for the differences in means between the
three pairs of year groups for each question are shown in Table 6. For both
questions the differences between Yr 6/7 and Yr 8/9 and between Yr 6/7 and
Yr 10 were statistically significant, whereas the differences between Yr 8/9
and Yr 10 were not.


Table 6. Confidence intervals for mean differences between each pair of year groups for each
question

Question        Comparison         Mean Difference    95% Confidence Interval
Lung Disease    Yr 8/9 - Yr 6/7    1.20               (0.32, 2.08)
Lung Disease    Yr 10 - Yr 8/9     0.37               (-0.43, 1.17)
Lung Disease    Yr 10 - Yr 6/7     1.57               (0.70, 2.44)
Indigestion     Yr 8/9 - Yr 6/7    1.41               (0.42, 2.40)
Indigestion     Yr 10 - Yr 8/9     -0.12              (-0.91, 0.66)
Indigestion     Yr 10 - Yr 6/7     1.29               (0.25, 2.32)
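
Intervals of this kind can be approximated from the group means, standard deviations, and sample sizes. The Python sketch below is an approximation under assumptions, since the exact method used for Table 6 is not stated: it uses an unpooled standard error and a normal quantile, so its endpoints will differ slightly from the tabled values. The function name `mean_difference_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def mean_difference_interval(m1, s1, n1, m2, s2, n2, level=0.95):
    """Approximate interval for m2 - m1 from group summary statistics."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # unpooled standard error
    diff = m2 - m1
    return diff - z * se, diff + z * se


# Lung Disease summaries (mean, std dev, N) for the three year groups.
yr67 = (2.64, 1.73, 28)
yr89 = (3.84, 1.89, 44)
yr10 = (4.21, 1.83, 42)

early_gain = mean_difference_interval(*yr67, *yr89)
later_gain = mean_difference_interval(*yr89, *yr10)
print(early_gain)
print(later_gain)
```

Under these assumptions the interval for Yr 8/9 minus Yr 6/7 lies wholly above zero, while the interval for Yr 10 minus Yr 8/9 includes zero, in line with the pattern of significance reported.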


Discussion 

The structure used for analyzing these two problems recognized two
types of engagement with them: using the data provided in the problem or
using other information known to the student. These elements (the
engagement with the social context using information known to the student and
the application of statistical skills using the data provided) are characteristic
of increasing levels of statistical literacy (Watson & Callingham, 2003).

The necessity of context before statistics can make sense has long been
recognized (e.g., Rao, 1975) but in this study there is the question of
whether it can interfere with the statistical reasoning that takes place. Hence, it
is interesting to note that Code 2 responses represented 26% of responses to
the Lung Disease problem but only 4% for the Indigestion problem,
probably reflecting the more common understanding in society about issues
of lung disease and smoking than about drug usage and indigestion. The
difference in the percentages supports the view that it is genuine
interference of beliefs, rather than a desire to avoid engaging with the
mathematics, that accounts for the high value for the Lung Disease problem; otherwise
it would be expected that the percentage of Code 2 responses would be the
same for both problems. Code 3 arguments were the basis for 10% of the
responses to both problems, indicating the consistency of the concern for
methodological issues across the two problems. Except for Code 2, the
percentages for the codes were quite similar.

The Codes 4 to 6 recognize the use of an increasing number of data
values in the solution. The need to account explicitly for all data to eliminate
alternate interpretations reduced the number of appropriate decisions given
credit compared to the analysis of Batanero et al. (1996). It is important for
teachers to be aware of this necessity and to ask students for a complete
justification rather than a single answer, perhaps based on one or two cells
in the table. The move to proportional reasoning, such as the use of percents
to make comparisons, is an important component of the middle years’
curriculum. The fact that nearly 30% of all students reached this level for
Indigestion, compared with nearly one-quarter for Lung Disease, suggests that
mathematically the Indigestion problem was slightly easier, even though
the context appeared to be less familiar. The implication for teachers is that
they need to develop generic statistical skills in their students that can be
applied regardless of context.

On one hand it might be suggested that, because each problem was presented
with data in a two-way table, students would use the same method of
solution and there would be a high correlation of scores. On the other hand,
the existence of different types of association represented and the different
contexts in the data of the two problems might suggest otherwise. There
was no association of levels of response for Years 6/7 students,
suggesting there was little recognition of similarity by these students (Table 5).
Although the correlation of levels of response was positive for Years 8/9, it
was not significant, whereas the correlation for Year 10 was significant (p <
.005). This may represent an increasing recognition of the similarity of the
problems with more years of experience in mathematics classrooms.

This extra experience in mathematics classrooms by Year 10, however,
does not appear to have resulted in a continuation of the increase in level of
performance on the two problems that was observed from Years 6/7 to Years
8/9 (Table 6). Given the presence of topics associated with percentages,
rates, ratios, and two-way tables in about Year 8 of middle school curricula
(e.g., ACARA, 2013; AEC, 1991; Franklin et al., 2007; National Council
of Teachers of Mathematics, 1989), it is perhaps not surprising that
performance increases significantly from Years 6/7 but then levels off. At least it
does not regress to a large degree. It should be noted, however, that
continued review of the topics associated with these problems is needed in the
senior years of secondary school, otherwise the skills will be lost.
Considering complex conditional problems as suggested by Watson (2011) is
exceedingly difficult without the basic skills and awareness of the need to
consider all information presented in cells of two-way tables.


Implications and Conclusions 

Although this study took place in Australia, the topic and the results are
also relevant in other countries. The broad issues related to risk, raised for
example by Gigerenzer (2002), are relevant to all statistically literate adults
across the world. He provides many cases of misunderstandings, which
could hopefully be reduced in number with attention paid to two-way tables
in the middle and high school years. In the United States this is recognized
in the Common Core State Standards for Mathematics (Common Core State
Standards Initiative, 2010) in Year 8 under “Investigate patterns of
association in bivariate data.”

Understand that patterns of association can also be seen in bivariate
categorical data by displaying frequencies and relative frequencies
in a two-way table. Construct and interpret a two-way table
summarizing data on two categorical variables collected from the same
subjects. Use relative frequencies calculated for rows or columns
to describe possible association between the two variables (p. 56).

Further, at the high school level the Common Core recommends using
two-way tables to “decide if events are independent and to estimate
conditional probabilities” (p. 82). These two extracts show that the Common Core
appreciates the important role two-way tables have to play in the meeting
of statistics and probability. Along with stressing that data are “numbers
in context” (p. 79) the Common Core also recognizes the importance of
proportional reasoning at every year from Year 6 (e.g., pp. 41, 46, 52, 67).

This study has highlighted the complexity of analyzing two-way table
problems and emphasized the importance of asking students to justify their
answers. It has shown that the context of the problem may interfere with
decisions due to pre-conceived ideas and that at times students can reach a
correct conclusion without eliminating alternative possibilities. It is
important for teachers to be aware of the large variation in possible answers and
explanations in order to assist students and assess appropriately. Although
this study was not about teachers’ pedagogical content knowledge (PCK)
(Shulman, 1987), the understandings explored here should be part of the
knowledge teachers incorporate into their classroom PCK.

It is rather surprising that more research has not been carried out in
relation to two-way table problems, particularly in comparing or contrasting
associations of variables that are positively or negatively related, or
independent. Further studies could look at the overall magnitude and relative
size of the numbers in the cells of tables and whether this contributes to the
difficulty. Also of interest is the relationship of the context chosen for the
data presented and the likely pre-conceived ideas that students may have
about it. Additionally, a future study investigating the levels of
understanding of their teachers, as was done by Watson and Nathan (2010) for the
Lung Disease problem based on interviews with the teachers, might be fruitful.
There may be a need for professional learning for teachers, particularly in
relation to inverse relationships.